current situation
Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions
Cross, Logan, Haber, Nick, Yamins, Daniel L. K.
As large language models (LLMs) advance, there is growing interest in using them to simulate human social behavior through generative agent-based modeling (GABM). However, validating these models remains a key challenge. We present a systematic two-stage validation approach using social dilemma paradigms from psychological literature, first identifying the cognitive components necessary for LLM agents to reproduce known human behaviors in mixed-motive settings from two landmark papers, then using the validated architecture to simulate novel conditions. Our model comparison of different cognitive architectures shows that both persona-based individual differences and theory of mind capabilities are essential for replicating third-party punishment (TPP) as a costly signal of trustworthiness. For the second study on public goods games, this architecture is able to replicate an increase in cooperation from the spread of reputational information through gossip. However, an additional strategic component is necessary to replicate the additional boost in cooperation rates in the condition that allows both ostracism and gossip. We then test novel predictions for each paper with our validated generative agents. We find that TPP rates significantly drop in settings where punishment is anonymous, yet a substantial amount of TPP persists, suggesting that both reputational and intrinsic moral motivations play a role in this behavior. For the second paper, we introduce a novel intervention and see that open discussion periods before rounds of the public goods game further increase contributions, allowing groups to develop social norms for cooperation. This work provides a framework for validating generative agent models while demonstrating their potential to generate novel and testable insights into human social behavior.
Feedback-Induced Performance Decline in LLM-Based Decision-Making
Yang, Xiao, Leitner, Juxi, Burke, Michael
The ability of Large Language Models (LLMs) to extract context from natural language problem descriptions naturally raises questions about their suitability in autonomous decision-making settings. This paper studies the behaviour of these models within Markov Decision Processes (MDPs). While traditional reinforcement learning (RL) strategies commonly employed in this setting rely on iterative exploration, LLMs, pre-trained on diverse datasets, offer the capability to leverage prior knowledge for faster adaptation. We investigate online structured prompting strategies in sequential decision-making tasks, comparing the zero-shot performance of LLM-based approaches to that of classical RL methods. Our findings reveal that although LLMs demonstrate improved initial performance in simpler environments, they struggle with planning and reasoning in complex scenarios without fine-tuning or additional guidance. Our results show that feedback mechanisms, intended to improve decision-making, often introduce confusion, leading to diminished performance in intricate environments. These insights underscore the need for further exploration into hybrid strategies, fine-tuning, and advanced memory integration to enhance LLM-based decision-making capabilities.
What to Do Next? Memorizing skills from Egocentric Instructional Video
Learning to perform activities through demonstration requires extracting meaningful information about the environment from observations. In this research, we investigate the challenge of planning high-level goal-oriented actions in a simulation setting from an egocentric perspective. We present a novel task, interactive action planning, and propose an approach that combines topological affordance memory with a transformer architecture. The process of memorizing the environment's structure through extracting affordances facilitates selecting appropriate actions based on the context. Moreover, the memory model allows us to detect action deviations while accomplishing specific objectives. To assess the method's versatility, we evaluate it in a realistic interactive simulation environment. Our experimental results demonstrate that the proposed approach learns meaningful representations, resulting in improved performance and robustness when action deviations occur.
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Laine, Rudolf, Chughtai, Bilal, Betley, Jan, Hariharan, Kaivalya, Scheurer, Jeremy, Balesni, Mikita, Hobbhahn, Marius, Meinke, Alexander, Evans, Owain
AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. These tests form the Situational Awareness Dataset (SAD), a benchmark comprising 7 task categories and over 13,000 questions. The benchmark tests numerous abilities, including the capacity of LLMs to (i) recognize their own generated text, (ii) predict their own behavior, (iii) determine whether a prompt is from internal evaluation or real-world deployment, and (iv) follow instructions that depend on self-knowledge. We evaluate 16 LLMs on SAD, including both base (pretrained) and chat models. While all models perform better than chance, even the highest-scoring model (Claude 3 Opus) is far from a human baseline on certain tasks. We also observe that performance on SAD is only partially predicted by metrics of general knowledge (e.g. MMLU). Chat models, which are finetuned to serve as AI assistants, outperform their corresponding base models on SAD but not on general knowledge tasks. The purpose of SAD is to facilitate scientific understanding of situational awareness in LLMs by breaking it down into quantitative abilities. Situational awareness is important because it enhances a model's capacity for autonomous planning and action. While this has potential benefits for automation, it also introduces novel risks related to AI safety and control. Code and latest results available at https://situational-awareness-dataset.org .
Forewarn: Business growth with current situation of AI in Construction Market - DataScienceCentral.com
Today, AI has become a common tool in the construction industry for carrying out many construction activities. Many large construction companies around the globe are adopting AI because it supports a multitude of applications. These range from accurately evaluating a project's cost overrun, on the basis of factors such as the type of contract, the project size, and the competence level of the managers, to risk mitigation via self-driving machinery and equipment.
Hariri
Recommender systems have become essential tools in many application areas as they help alleviate information overload by tailoring their recommendations to users' personal preferences. Users' interests in items, however, may change over time depending on their current situation. Without considering the current circumstances of a user, recommendations may match the general preferences of the user, but they may have little utility for the user in their current situation. We focus on designing systems that interact with the user over a number of iterations and at each step receive feedback from the user in the form of a reward or utility value for the recommended items. The goal of the system is to maximize the sum of obtained utilities over each interaction session. We use a multi-armed bandit strategy to model this online learning problem and we propose techniques for detecting changes in user preferences. The recommendations are then generated based on the most recent preferences of a user. Our evaluation results indicate that our method can improve on existing bandit algorithms by considering sudden variations in the user's feedback behavior.
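The abstract above does not say which bandit algorithm or change detector is used, so the following is only a minimal sketch of the general idea, assuming an epsilon-greedy policy and a simple window-based detector. The class name `ChangeAwareBandit` and the parameters `epsilon`, `window`, and `threshold` are hypothetical choices made for illustration, not the authors' method.

```python
import random
from collections import deque


class ChangeAwareBandit:
    """Epsilon-greedy bandit that restarts an arm's estimate when its
    recent rewards drift away from its long-run mean (illustrative only)."""

    def __init__(self, n_arms, epsilon=0.1, window=10, threshold=0.25, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon      # exploration probability (assumed value)
        self.window = window        # number of recent rewards kept per arm
        self.threshold = threshold  # drift that counts as a preference change
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.recent = [deque(maxlen=window) for _ in range(n_arms)]
        self.resets = 0             # how many changes were detected

    def select(self):
        # Try every arm once before trusting any estimate.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental update of the long-run mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        self.recent[arm].append(reward)

        # Change detection: compare the recent window with the long-run mean.
        if len(self.recent[arm]) == self.window:
            recent_mean = sum(self.recent[arm]) / self.window
            if abs(recent_mean - self.means[arm]) > self.threshold:
                # Discard old history and keep only the recent preferences.
                self.counts[arm] = self.window
                self.means[arm] = recent_mean
                self.resets += 1


# Usage: a user who liked item 0 for 20 rounds and then stopped liking it.
bandit = ChangeAwareBandit(n_arms=2)
for reward in [1.0] * 20 + [0.0] * 20:
    bandit.update(0, reward)
print(bandit.resets, round(bandit.means[0], 3))
```

A plain running average would still report a mean of 0.5 for item 0 after this sequence, whereas the window-based restart lets the estimate follow the user's most recent feedback.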
Pinaki Laskar on LinkedIn: #AGI #Sensors #AI
An understanding of #AGI architecture includes effectors, which are understood as sensors and actuators. Effectors are an ambivalent kind of element: they are controlled by the AGI system and at the same time are part of the natural or virtual embodiment. Both sensors and actuators imply "smart" devices or units that carry out a two-way exchange of information with the AGI system itself. The AGI sends commands to the effector and receives data in response. From the AGI's point of view, the difference between sensors and actuators is that the purpose of sensors is to collect information about the current situation, while the purpose of actuators is to change the situation.
Clustering U.S. counties by their COVID-19 curves
Analytics has become a part of everyone's daily routine as a result of the pandemic. Every day we look at curves of new cases, positivity rates, and a range of other metrics that give us insight into our current situation. One interesting metric used by the CDC, along with many news networks and publications is hotspot classification. A hotspot is a county or state where cases are currently increasing at a relatively high rate [1]. This metric is simple and easy to understand. But it leaves out a lot of interesting details.
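The text describes a hotspot only as a place where cases are "currently increasing at a relatively high rate" and does not give the CDC's actual criteria. As a rough illustration of that description, the sketch below compares the most recent week of daily case counts with the week before it. The function name `is_hotspot` and the parameters `window`, `min_growth`, and `min_recent` are hypothetical, and the threshold values are placeholders rather than official cutoffs.

```python
def is_hotspot(daily_cases, window=7, min_growth=0.5, min_recent=10):
    """Return True if cases in the latest window grew by at least
    `min_growth` (as a fraction) relative to the preceding window.

    All thresholds here are illustrative assumptions, not CDC values.
    """
    # Two full windows are needed to compare "now" with "before".
    if len(daily_cases) < 2 * window:
        return False

    recent = sum(daily_cases[-window:])
    previous = sum(daily_cases[-2 * window:-window])

    # Ignore places with too few recent cases to be meaningful.
    if recent < min_recent:
        return False
    # Growth from zero cannot be expressed as a ratio; treat it as a hotspot.
    if previous == 0:
        return True

    return (recent - previous) / previous >= min_growth


# Usage: cases double from one week to the next versus staying flat.
print(is_hotspot([10] * 7 + [20] * 7))  # growth of 100%
print(is_hotspot([10] * 14))            # no growth
```

A single yes/no flag like this discards the shape of the curve, which is the kind of detail that comparing whole curves can retain.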